159 research outputs found
Combining Stream Mining and Neural Networks for Short Term Delay Prediction
The systems monitoring the location of public transport vehicles rely on
wireless transmission. The location readings from GPS-based devices are
received with some latency caused by periodic data transmission and temporary
problems preventing data transmission. This negatively affects identification
of delayed vehicles. The primary objective of the work is to propose a short-term
hybrid delay prediction method. The method relies on adaptive selection of
Hoeffding trees, a stream classification technique, and multilayer
perceptrons. In this way, the hybrid method proposed in this study provides
anytime predictions and eliminates the need to collect extensive training data
before any predictions can be made. Moreover, the use of neural networks
increases the accuracy of the predictions compared with the use of Hoeffding
trees only.
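To illustrate what adaptive selection between two predictors could look like, here is a minimal sketch. It is not the authors' selection rule: the class name `AdaptiveSelector`, the sliding-window mean absolute error criterion, and the `window` parameter are all assumptions made for illustration, and the models are stand-in callables rather than real Hoeffding trees or perceptrons.

```python
from collections import deque


class AdaptiveSelector:
    """Pick, per prediction, the model with the lowest recent error.

    Hypothetical illustration: `models` maps a name to any callable
    that takes an input and returns a numeric prediction.
    """

    def __init__(self, models, window=50):
        self.models = models
        # One bounded error history per model (most recent `window` errors).
        self.errors = {name: deque(maxlen=window) for name in models}

    def _mean_error(self, name):
        errs = self.errors[name]
        # With no history yet, every model scores 0.0 and the first one wins.
        return sum(errs) / len(errs) if errs else 0.0

    def predict(self, x):
        """Return (name_of_chosen_model, its_prediction)."""
        best = min(self.models, key=self._mean_error)
        return best, self.models[best](x)

    def update(self, x, y_true):
        """Record each model's absolute error once the true value arrives."""
        for name, model in self.models.items():
            self.errors[name].append(abs(model(x) - y_true))


# Usage: two toy predictors; "identity" is exact on this data.
selector = AdaptiveSelector({"zero": lambda x: 0.0, "identity": lambda x: x})
selector.update(1.0, 1.0)
selector.update(2.0, 2.0)
chosen, prediction = selector.predict(3.0)
```

Because predictions are available before any error history exists, a selector of this kind can answer from the first instance onward, which is the "anytime" property the abstract refers to.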
Fairness-enhancing interventions in stream classification
The widespread use of automated data-driven decision support systems has
raised a lot of concerns regarding accountability and fairness of the employed
models in the absence of human supervision. Existing fairness-aware approaches
tackle fairness as a batch learning problem and aim at learning a fair model
which can then be applied to future instances of the problem. In many
applications, however, the data comes sequentially and its characteristics
might evolve with time. In such a setting, it is counter-intuitive to "fix" a
(fair) model over the data stream, as changes in the data might induce changes in
the underlying model, thereby affecting its fairness. In this work, we
propose fairness-enhancing interventions that modify the input data so that the
outcome of any stream classifier applied to that data will be fair. Experiments
on real and synthetic data show that our approach achieves good predictive
performance and low discrimination scores over the course of the stream.
Comment: 15 pages, 7 figures. To appear in the proceedings of the 30th
International Conference on Database and Expert Systems Applications, Linz,
Austria August 26 - 29, 201
Exploiting a Stimuli Encoding Scheme of Spiking Neural Networks for Stream Learning
Stream data processing has gained progressive momentum with the arrival of new stream applications and big data scenarios. One of the most promising techniques in stream learning is the Spiking Neural Network, and some such networks use an interesting population encoding scheme to transform the incoming stimuli into spikes. This study sheds light on the key issue of this encoding scheme, the Gaussian receptive fields, and focuses on applying them as a pre-processing technique to any dataset in order to gain representativeness and to boost the predictive performance of stream learning methods. Experiments with synthetic and real data sets are presented, and confirm that our approach can be applied successfully as a general pre-processing technique in many real cases.
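As a rough illustration of Gaussian receptive fields used as a pre-processing step, the sketch below expands one scalar feature into several overlapping Gaussian activations. The function name `grf_encode`, the evenly spaced centers, and the width rule controlled by `beta` are assumptions chosen for simplicity; the study may place and scale its fields differently.

```python
import numpy as np


def grf_encode(x, x_min, x_max, n_fields=5, beta=1.5):
    """Encode scalar values with a population of Gaussian receptive fields.

    Assumed layout (illustrative only): centers evenly spaced over
    [x_min, x_max]; a shared width equal to the center spacing / beta.
    Returns an array of shape (n_samples, n_fields) with values in (0, 1].
    """
    if n_fields < 2 or x_max <= x_min:
        raise ValueError("need n_fields >= 2 and x_max > x_min")
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    centers = np.linspace(x_min, x_max, n_fields)
    width = (x_max - x_min) / (beta * (n_fields - 1))
    # Each column is the response of one field to every input value.
    return np.exp(-0.5 * ((x - centers) / width) ** 2)


# Usage: one feature in [0, 1] becomes five features.
encoded = grf_encode([0.0, 0.5], x_min=0.0, x_max=1.0)
```

The encoded columns can then be passed to any stream learner in place of the raw feature.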
A Survey on Concept Drift Adaptation
Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning, in this paper we characterize the adaptive learning process, categorize existing strategies for handling concept drift, discuss the most representative, distinct and popular techniques and algorithms, discuss the evaluation methodology of adaptive algorithms, and present a set of illustrative applications. This introduction to concept drift adaptation presents state-of-the-art techniques and a collection of benchmarks for researchers, industry analysts and practitioners. The survey aims at covering the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art.
Efficient estimation of AUC in a sliding window
In many applications, monitoring area under the ROC curve (AUC) in a sliding
window over a data stream is a natural way of detecting changes in the system.
The drawback is that computing AUC in a sliding window is expensive, especially
if the window size is large and the data flow is significant.
In this paper we propose a scheme for maintaining an approximate AUC in a
sliding window of length $k$. More specifically, we propose an algorithm that,
given $\epsilon$, estimates AUC within $\epsilon / 2$, and can maintain this
estimate in $O((\log k) / \epsilon)$ time, per update, as the window slides.
This provides a speed-up over the exact computation of AUC, which requires
$O(k)$ time, per update. The speed-up becomes more significant as the size of
the window increases. Our estimate is based on grouping the data points
together, and using these groups to calculate AUC. The grouping is designed
carefully such that (i) the groups are small enough, so that the error stays
small, (ii) the number of groups is small, so that enumerating them is not
expensive, and (iii) the definition is flexible enough so that we can
maintain the groups efficiently.
Our experimental evaluation demonstrates that the average approximation error
in practice is much smaller than the approximation guarantee,
and that we can achieve significant speed-ups with only a modest sacrifice in
accuracy.
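For context, the sketch below shows the straightforward baseline that such a scheme aims to speed up: recomputing the exact AUC from everything currently in the window. It is not the grouping-based approximation described in the abstract. The class name `SlidingWindowAUC` and the parameter `window_size` are illustrative assumptions; labels are assumed to be 0 or 1.

```python
from bisect import bisect_left, bisect_right
from collections import deque


class SlidingWindowAUC:
    """Exact AUC over the most recent `window_size` (score, label) pairs."""

    def __init__(self, window_size):
        # deque drops the oldest pair automatically once it is full.
        self.window = deque(maxlen=window_size)

    def update(self, score, label):
        self.window.append((score, label))

    def auc(self):
        """Recompute AUC from scratch; None if a class is missing."""
        pos = [s for s, y in self.window if y == 1]
        neg = sorted(s for s, y in self.window if y == 0)
        if not pos or not neg:
            return None
        wins = 0.0
        for p in pos:
            below = bisect_left(neg, p)            # negatives scored lower
            ties = bisect_right(neg, p) - below    # negatives scored equal
            wins += below + 0.5 * ties
        return wins / (len(pos) * len(neg))


# Usage: window of four items, then slide by one.
monitor = SlidingWindowAUC(window_size=4)
for score, label in [(0.9, 1), (0.8, 0), (0.7, 1), (0.1, 0)]:
    monitor.update(score, label)
first = monitor.auc()
monitor.update(0.2, 0)   # evicts (0.9, 1)
second = monitor.auc()
```

Every call to `auc()` touches the whole window, so the cost of each query grows with the window size, which is the expense the abstract points to.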
Improving adaptive bagging methods for evolving data streams
We propose two new improvements for bagging methods on evolving data streams. Recently, two new variants of Bagging were proposed: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. ASHT Bagging uses trees of different sizes, and ADWIN Bagging uses ADWIN as a change detector to decide when to discard underperforming ensemble members. We improve ADWIN Bagging using Hoeffding Adaptive Trees, trees that can adaptively learn from data streams that change over time. To speed up the adaptation of ASHT Bagging to change, we add an error change detector for each classifier. We test our improvements by performing an evaluation study on synthetic and real-world datasets comprising up to ten million examples.
On the performance of deep learning models for time series classification in streaming
Processing data streams arriving at high speed requires the development of
models that can provide fast and accurate predictions. Although deep neural
networks are the state-of-the-art for many machine learning tasks, their
performance in real-time data streaming scenarios is a research area that has
not yet been fully addressed. Nevertheless, there have been recent efforts to
adapt complex deep learning models for streaming tasks by reducing their
processing rate. The design of the asynchronous dual-pipeline deep learning
framework allows predicting over incoming instances and updating the model
simultaneously using two separate layers. The aim of this work is to assess the
performance of different types of deep architectures for data streaming
classification using this framework. We evaluate models such as multi-layer
perceptrons, recurrent, convolutional and temporal convolutional neural
networks over several time-series datasets that are simulated as streams. The
obtained results indicate that convolutional architectures achieve a higher
performance in terms of accuracy and efficiency.
Comment: Paper submitted to the 15th International Conference on Soft
Computing Models in Industrial and Environmental Applications (SOCO 2020)
- …